Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval
نویسنده
چکیده
This paper concerns document ranking in information retrieval. In information retrieval systems, the widely accepted probability ranking principle (PRP) suggests that, for optimal retrieval, documents should be ranked in order of decreasing probability of relevance. In this paper, we present a new document ranking paradigm, arguing that a better, more general solution is to optimize top-n ranked documents as a whole, rather than ranking them independently. Inspired by the Modern Portfolio Theory in finance, we quantify a ranked list of documents on the basis of its expected overall relevance (mean) and its variance; the latter serves as a measure of risk, which was rarely studied for document ranking in the past. Through the analysis of the mean and variance, we show that an optimal rank order is the one that maximizes the overall relevance (mean) of the ranked list at a given risk level (variance). Based on this principle, we then derive an efficient document ranking algorithm. It extends the PRP by considering both the uncertainty of relevance predictions and correlations between retrieved documents. Furthermore, we quantify the benefits of diversification, and theoretically show that diversifying documents is an effective way to reduce the risk of document ranking. Experimental results on the collaborative filtering problem confirms the theoretical insights with improved recommendation performance, e.g., achieved over 300% performance gain over the PRP-based ranking on the user-based recommendation.
منابع مشابه
Back to the Roots: Mean-Variance Analysis of Relevance Estimations
Recently, mean-variance analysis has been proposed as a novel paradigm to model document ranking in Information Retrieval. The main merit of this approach is that it diversifies the ranking of retrieved documents. In its original formulation, the strategy considers both the mean of relevance estimates of retrieved documents and their variance. However, when this strategy has been empirically in...
متن کاملInvestigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval
Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model. Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملAccounting for Stability of Retrieval Algorithms using Risk-Reward Curves
Past evaluation of information retrieval algorithms has focused largely on achieving good average performance, without much regard for the stability or variance of retrieval results across queries. In fact, two algorithms that superficially appear to have equally desirable average precision performance can have very different stability or risk profiles. A prime example comes from query expansio...
متن کامل